








DISTRIBUTION UNLIMITED

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

**VLSI PUBLICATIONS** 

VLSI Memo No. 87-360 January 1987

RELIC: A RELIABILITY SIMULATOR FOR INTEGRATED CIRCUITS

Teresa S. Hohol and Lance A. Glasser




Abstract

Many of the failure mechanisms which cause reliability problems in VLSI chips can be influenced or avoided in the circuit design phase. RELIC is a reliability simulator developed to analyze and predict the stress and wear on MOS VLSI chips due to such mechanisms. RELIC uses a simple methodology for abstracting the idea of the stress from any particular failure mechanism, thus allowing analyses of many different failure mechanisms. There are currently three failure mechanisms analyzed by RELIC: metal migration, hot-electron trapping, and time-dependent dielectric breakdown (TDDB).

Microsystems Research Center, Room 39-321, Massachusetts Institute of Technology

Cambridge Massachusetts 02139

Telephone (617) 253-8138

# Acknowledgements

Published in Proc. International Conference on Computer-Aided Design, Santa Clara, CA, pp. 517-520, November 11-13, 1986. This paper will appear (translated into Japanese) in Nikkei Microdevices. This work was supported in part by the Defense Advanced Research Projects Agency under contract no. N00014-80-C-0622.

Author Information

Hohol, current address: Hanscom Air Force Base, MA 01731; Glasser: Department of Electrical Engineering and Computer Science and Research Laboratory of Electronics, MIT, Room 36-880, Cambridge, MA 02139, (617) 253-4677.

Copyright (c) 1987, MIT. Memos in this series are for use inside MIT and are not considered to be published merely by virtue of appearing in this series. This copy is for private circulation only and may not be further copied or distributed, except for government purposes, if the paper acknowledges U. S. Government sponsorship. References to this work should be either to the published version, if any, or in the form "private communication." For information about the ideas expressed herein, contact the author directly. For information about this series, contact Microsystems Research Center, Room 39-321, MIT, Cambridge, MA 02139; (617) 253-8138.

# RELIC: A RELIABILITY SIMULATOR FOR INTEGRATED CIRCUITS

Teresa S. Hohol and Lance A. Glasser

Department of Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139

# ABSTRACT \*

Many of the failure mechanisms which cause reliability problems in VLSI chips can be influenced or avoided in the circuit design phase. RELIC is a reliability simulator developed to analyze and predict the stress and wear on MOS VLSI chips due to such mechanisms. RELIC uses a simple methodology for abstracting the idea of the stress from any particular failure mechanism, thus allowing analyses of many different failure mechanisms. There are currently three failure mechanisms analyzed by RELIC: metal migration, hot electron trapping, and time dependent dielectric breakdown (TDDB).

# 1. Introduction

RELIC is a reliability simulator developed to analyze the stress and wear on MOS VLSI chips. RELIC is designed to help the circuit designer develop competitive and reliable chips with minimal extra effort. By using RELIC, chip designers can design not only for worst-case speed and power [Glasser 85], but also for worst-case reliability. RELIC is not built around any particular failure phenomenon or model; rather, RELIC exists as a system in which many failure mechanisms can be simulated and new models easily implemented.

RELIC simulates those failure mechanisms which are under the control, or at least the influence, of the circuit designer. For such failure mechanisms, there exist models which relate the median time to failure (MTTF) of the device to the operating voltages and currents of the circuit and the actual layout of the chip. RELIC employs a circuit simulator so that the voltages and currents used in stress calculations are worst-case operating waveforms and not just the maximum voltages or currents.

The idea of reliability simulation is not a new one, and software tools have been proposed to check circuit designs for certain reliability hazards, such as metal migration or hot electrons. A simulator to determine metal migration was described by Kokkonen [Kokkonen 84]. Substrate current circuit simulators, from which predictions about hot electron trapping can be made, have been described by Sing [Sing 80] and Sakurai [Sakurai 85]. Another hot electron simulator has been proposed by C. T. Sah [Iversen 86], who is currently developing a hot electron model from a combination of theory and experimental results.

While the simulation of failure mechanisms has been proposed previously, there are two major differences between earlier simulators and RELIC. First, RELIC calculates stress and wear based on actual dynamic voltages and currents determined by circuit simulation, and not on the basis of worst-case static operating conditions. In addition, RELIC is the first reliability simulator to run failure tests for several mechanisms. RELIC provides a structure in which existing models may be easily changed, and new models implemented with moderate effort. This is accomplished by the use of a simple methodology for handling the stress and wear caused by many different failure mechanisms.

# 2. Stress and Wear

One of the unifying concepts used in RELIC is that when a device accumulates a certain amount of stress over time (wear), it fails. This stress may be in the form of trapped charge in the gate oxide causing breakdown (TDDB), or in the form of a transistor threshold shift causing circuit malfunction. Regardless of which particular failure phenomenon is being tested, we can abstract the idea of stress and determine the MTTF of the device on the basis of stress and circuit configuration alone.

RELIC predicts the reliability of the circuit by first calculating the wear which each circuit device experiences over one normal cycle of operation. We define a normal cycle of operation to be the time it takes to run the circuitry through one routine of whatever it does. The wear w on each device is calculated as the integral of stress s over time.

$$w(t) \equiv \int_0^t s(t')dt'. \tag{1}$$

We assume a deterministic point of view which says that the amount of wear a device can stand until it fails is the critical value of wear  $w(T_{fail})$ .  $w(T_{fail})$  is a random variable with a mean and variance which must be determined from tests on fabricated devices. The critical value of wear may also depend on how and where a device is used in a circuit. For example, the amount of hot electron stress a device can endure without failure depends on its position and function in the overall circuit.

Once we know the critical value of wear (distribution) and the stress rate, we can find the time-to-failure, $T_{\text{fail}}$, where

$$T_{\text{fail}} = \frac{w(T_{\text{fail}})}{\langle s \rangle}, \tag{2}$$

and where the average stress is

$$\langle s \rangle = \frac{1}{T_{\text{fail}}} \int_{0}^{T_{\text{fail}}} s(t)\, dt = \frac{w(T_{\text{fail}})}{T_{\text{fail}}}. \tag{3}$$

<sup>\*</sup> Support for this research was provided by the Defense Advanced Research Projects Agency of the Department of Defense under contract number N00014-80-C-0622.

Note that we are making the assumption that the stress rate, as determined for one normal cycle, will remain constant for the life of the circuit.
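As a rough illustration of Eqs. (1) through (3), the sketch below integrates a sampled stress waveform over one normal cycle and extrapolates a time-to-failure under the constant-stress-rate assumption stated above. The function names, the sample waveform, and the critical wear value are hypothetical and are not taken from RELIC itself.

```python
def wear_over_cycle(times, stress):
    """Trapezoidal approximation of w = integral of s(t) dt over the samples (Eq. 1)."""
    if len(times) != len(stress) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        total += 0.5 * (stress[i] + stress[i - 1]) * dt
    return total


def time_to_failure(times, stress, critical_wear):
    """T_fail = w(T_fail) / <s>, assuming every cycle repeats the sampled one (Eq. 2)."""
    cycle_time = times[-1] - times[0]
    average_stress = wear_over_cycle(times, stress) / cycle_time
    if average_stress <= 0.0:
        return float("inf")  # no stress in the cycle, so no predicted wear-out
    return critical_wear / average_stress


# Hypothetical 100 ns cycle in which the device is stressed for a 20 ns window.
times = [0.0, 40e-9, 40e-9, 60e-9, 60e-9, 100e-9]
stress = [0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
cycle_wear = wear_over_cycle(times, stress)
t_fail = time_to_failure(times, stress, critical_wear=1.0e3)
```

The units of stress and wear depend on the failure model, so the numbers here are only placeholders; in practice the critical wear would come from tests on fabricated devices, as described above.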

Another important assumption here is that the stress, s, is of the form

$$s(t) = A(t)e^{bE} \tag{4}$$

where A(t), b, and E are quantities whose form varies for each failure model. E is assumed to be approximately constant over time. These assumptions for s are valid because the phenomena which we are considering all have exponential character in energy space. (See the example of the metal migration model later.) Given that the stress s depends exponentially on E, if E is a normal random variable then s is a log-normal variable.

Substituting (3) and (4) into (2), and taking the log of both sides, we then have

$$\ln T_{\text{fail}} = \ln w(T_{\text{fail}}) - \ln \left( \frac{1}{T_{\text{fail}}} \int_{0}^{T_{\text{fail}}} A(t) e^{bE}\, dt \right) \tag{5}$$

which, with the approximately constant E taken outside the integral and A denoting the time average of A(t), reduces to

$$\ln T_{\text{fail}} = \ln w(T_{\text{fail}}) - \ln A - bE. \tag{6}$$

Therefore, because E is a normal random variable and s is consequently log-normal, $T_{\text{fail}}$, which depends inversely on s, also has a log-normal probability distribution. RELIC assumes log-normal distributions for all failure mechanisms.
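To make the log-normal argument concrete, the following sketch evaluates Eq. (6) for a normally distributed E and returns the mean and standard deviation of $\ln T_{\text{fail}}$. It treats the critical wear and the averaged A as fixed numbers, which is a simplification of the text (the critical wear is itself a random variable), and all parameter values are made up for illustration.

```python
import math


def ttf_lognormal_parameters(critical_wear, a_avg, b, e_mean, e_sigma):
    """Return (mu, sigma) of ln(T_fail) from Eq. (6) when E ~ Normal(e_mean, e_sigma).

    ln T_fail = ln w - ln A - b*E is linear in E, so it is normal with
    mean ln w - ln A - b*e_mean and standard deviation |b|*e_sigma.
    """
    mu = math.log(critical_wear) - math.log(a_avg) - b * e_mean
    sigma = abs(b) * e_sigma
    return mu, sigma


# Hypothetical, dimensionless parameter values.
mu, sigma = ttf_lognormal_parameters(
    critical_wear=1.0e3, a_avg=2.0, b=-1.5, e_mean=0.5, e_sigma=0.05
)
median_ttf = math.exp(mu)  # the median of a log-normal variable is exp(mu)
```

Because the median of a log-normal distribution is exp(mu), this is one way to read the "median time to failure" used throughout the paper.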

This generalized model of stress is applicable for all of the failure models which are currently implemented in RELIC. A, b, and E can be determined from model and circuit parameters and circuit simulation. The critical wear value  $w(T_{\text{fail}})$  must be determined experimentally and, as mentioned earlier, could depend on a device's location and function in the circuit.

Once RELIC has computed the time-to-failure distribution for each device, it combines these distributions to obtain the total failure distribution for the entire circuit. We assume that the failure of one device has no effect on the probability of failure of any other device, and therefore treat the failure probabilities as independent. The probability of system failure is then $P_{\text{sysfail}} = 1 - P_{\text{syswork}}$, where $P_{\text{syswork}}$ is the probability that the system has not failed. Assume that one device failure is enough to cause the system to fail. Given independent devices, $P_{\text{syswork}}$ is equal to the product of the probabilities that each device is working. Thus, $P_{\text{sysfail}}$ is found to be

$$P_{\text{sysfail}} = 1 - \prod_{i} (1 - P_{\text{fail}_i}). \tag{7}$$



In addition, if all of the individual probabilities for device failure are small, and the cross products of (7) even smaller, we may then use what is known as the Rare Event Approximation. Using this approximation, the probability of system failure becomes the linear sum of the individual failure probabilities for each device. That is,

$$P_{\text{sysfail}} \leq \sum_{i} P_{\text{fail}_i}. \tag{8}$$

Note that this approximation holds only if the number of devices $n_{\text{dev}}$ is small enough that $n_{\text{dev}} \ll \frac{1}{P_{\text{fail}}}$.
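The difference between the exact product form (7) and the rare-event sum (8) can be checked numerically. The sketch below is only an illustration with invented device failure probabilities; it shows the sum tracking the exact value closely when $n_{\text{dev}} P_{\text{fail}} \ll 1$ and overestimating it badly otherwise.

```python
def system_failure_exact(p_fail):
    """Eq. (7): one minus the product of the per-device survival probabilities."""
    p_work = 1.0
    for p in p_fail:
        p_work *= 1.0 - p
    return 1.0 - p_work


def system_failure_rare_event(p_fail):
    """Eq. (8): upper bound given by the sum of the per-device failure probabilities."""
    return sum(p_fail)


# Hypothetical cases: 100 very reliable devices versus 5 unreliable ones.
p_small = [1.0e-6] * 100
p_large = [0.2] * 5

exact_small = system_failure_exact(p_small)
bound_small = system_failure_rare_event(p_small)
exact_large = system_failure_exact(p_large)
bound_large = system_failure_rare_event(p_large)
```

In the second case the sum reaches 1.0 while the exact value is about 0.67, which is why the approximation is restricted to small individual probabilities and a limited device count.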

# 3. System Structure and Implementation



Figure 1. Structure of the RELIC System

The RELIC simulator consists of three parts: a preprocessor, a modified circuit simulator, and a postprocessor, as illustrated in Fig 1. We use the circuit simulator both to simulate the circuit to determine voltage and current waveforms and to calculate device stress based on these conditions. The simulator in this implementation is RELAX2.2 from the University of California at Berkeley [White 85]. What we have done, basically, is to introduce into RELAX some new device models, one for each failure mechanism we are simulating.



Figure 2. Reliability Test Structures

These new models, which I shall refer to as reliability test structures, are connected to the circuit nodes of the device undergoing testing to measure its voltages and currents and, from these operating conditions along with the device size and processing parameters, calculate the instantaneous stress on that device [Fig 2]. Besides sensing the circuit operating conditions, test structures also employ circuit nodes connected to various configurations of resistors and capacitors to calculate intermediate quantities. A final node, the wear node, is used to output the accumulated stress. This node is connected to a grounded capacitor and is fed a current proportional to the instantaneous stress, so that the voltage on this node represents the total wear incurred.
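The wear node is in effect an analog integrator: a current proportional to the stress charges a grounded capacitor, so the node voltage tracks the running integral of the stress. The sketch below mimics that behaviour numerically. The gain and capacitance values are arbitrary assumptions, chosen equal so that the voltage reads directly as wear; RELIC's actual scaling is not given in the text.

```python
def simulate_wear_node(times, stress, gain=1.0e-6, capacitance=1.0e-6):
    """Integrate C * dV/dt = gain * s(t) and return the node voltage at each sample.

    The injected current over each step is taken as the average of the
    stress at the two end points, multiplied by the gain.
    """
    voltage = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        current = gain * 0.5 * (stress[i] + stress[i - 1])
        voltage.append(voltage[-1] + current * dt / capacitance)
    return voltage


# Hypothetical constant stress of 2.0 applied for one time unit.
times = [i * 0.1 for i in range(11)]
stress = [2.0] * 11
wear_voltage = simulate_wear_node(times, stress)
final_wear = wear_voltage[-1]
```

With gain equal to capacitance, the final voltage equals the accumulated wear; any other ratio simply rescales the reading.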

The RELIC preprocessor, PREL, takes an input file which describes the circuit according to RELAX syntax. This input file must also contain some RELIC commands indicating which devices to test, and for which failure mechanisms. PREL modifies this file to include the appropriate reliability test structures for each device undergoing testing. PREL also instructs the circuit simulator to output the voltage waveform for the new wear node. This new circuit configuration is then simulated by the circuit simulator for one normal cycle of operation and the wear incurred for this cycle is output. Finally, the RELIC postprocessor must compare this device wear information with critical failure wear data to determine the time-to-failure of individual devices and the MTTF for the entire circuit. The postprocessor is currently under development at the time of this writing.

# 4. Failure Models

The current implementation of RELIC contains models for metal migration [Kokkonen 84], hot electron trapping [Hsu 82], and time dependent dielectric breakdown [Chen 85]. Presented here are the salient features of the metal migration model, which incorporates the effects of IR heating, thermal capacitance, and thermal resistance. The wire stress depends on the wire temperature and current waveforms.



Figure 3. Implementation of the Wire Model

The wire model is a 4-node reliability test structure. Two of the nodes, N1 and N2, represent the ends of the wire and are connected to the circuit nodes on either end of the wire. (This feature is not well-supported by layout extractors, which consider a wire to be one node in the circuit; consequently, the locations of all wires the user wishes to simulate must be identified in the PREL input file.) RELIC models the electrical behavior of the wire as a pi model, having the wire's resistance between N1 and N2 and half the wire capacitance from each of these nodes to ground [Fig 3].

The wire model also employs a node, N3, for use in calculating an intermediate quantity. The stress on the wire is dependent on the wire temperature, which is a function of the thermal capacitance and resistance. The thermal equivalent RC circuit is also shown in [Fig 3], where the voltage on N3 represents the change in temperature of the wire.

The final node of the wire model is N4, which is the wear node. The current source, S2, is proportional to the instantaneous stress calculated by the metal migration equations and, therefore, the voltage on N4 is proportional to the amount of metal migration stress on the wire.

The stress on the wire due to metal migration is roughly expressed as

$$s(t) = A J^{2}(t)\, e^{-E_a/(kT)} \tag{9}$$

where A is related to the wire's physical size, J is the current density through the wire, $E_a$ is the activation energy, k is Boltzmann's constant, and T is the wire temperature. Note that this is consistent with our assumption earlier that the stress in our models was exponential in energy space. This is also true in the hot electron and TDDB models. Details on all of the equations and parameters used in RELIC can be found in [Hohol 86].
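A minimal numerical sketch of the wire model follows, assuming the usual Arrhenius form in which stress rises with temperature, Joule heating of $I^2 R$ as the thermal source, and a single thermal RC stage for the temperature rise. Every parameter value, the function name, and the forward-Euler integration are assumptions made for illustration; the actual RELIC equations are those in the thesis cited above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann's constant in eV/K


def wire_wear(times, currents, resistance, area, r_thermal, c_thermal,
              activation_energy_ev, ambient_k=300.0, a_coeff=1.0):
    """Forward-Euler sketch of the thermal RC stage and the wear accumulation.

    delta_t plays the role of the N3 voltage (temperature rise) and
    wear plays the role of the N4 voltage (accumulated stress).
    The time step must be much shorter than r_thermal * c_thermal.
    """
    delta_t = 0.0
    wear = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        current = currents[i - 1]
        power = current * current * resistance            # assumed Joule heating
        delta_t += dt * (power - delta_t / r_thermal) / c_thermal
        temperature = ambient_k + delta_t
        j = current / area                                 # current density
        stress = a_coeff * j * j * math.exp(
            -activation_energy_ev / (BOLTZMANN_EV_PER_K * temperature)
        )
        wear += stress * dt
    return wear, delta_t


# Hypothetical wire: 1 mA for 100 ns through a 5 um x 1 um cross-section.
times = [i * 1.0e-9 for i in range(101)]
currents = [1.0e-3] * 101
wear_narrow, rise = wire_wear(times, currents, resistance=10.0, area=5.0e-12,
                              r_thermal=1.0e3, c_thermal=1.0e-9,
                              activation_energy_ev=0.5)
```

The squared dependence on current density is what makes widening a wire so effective in this kind of model: halving J cuts the stress by a factor of four before any temperature effect is counted.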

The hot electron and TDDB models are implemented in basically the same way as the metal migration model. Both of these reliability test structures have sensing nodes which connect in parallel to the source, gate, drain, and bulk nodes of the transistor being tested. Because these test structures are inserted in parallel with the circuit device being tested, the user does not have to specify additional nodes in the circuit (unlike the wire model, which is inserted serially and requires two nodes instead of one).

# 5. Results: An Example



Figure 4. IBM Bootstrapped Superbuffer, Stressed

RELIC was used to analyze one version of an IBM Bootstrapped Superbuffer, shown in Fig. 4. This circuit uses two stages of bootstrapping in order to drive a large capacitive load. However, the result of this double bootstrapping is a large amount of hot electron and TDDB stress on transistor M1.



Figure 5. Hot Electron and TDDB Wear on M1

When the input N1 to the superbuffer is low and the bootstrap action has been completed, the source and gate of transistor M1 are at 0 V and the drain N2 is around 12.5 V. This large voltage differential across the oxide between the gate N1 and the drain N2 creates large amounts of TDDB stress and wear (see [Fig 5]). When the input N1 to the superbuffer rises again, transistor M1 turns on, with an initial drain-to-source voltage of 12.5 V. This large saturation voltage generates hot electron stress and wear on the gate oxide of M1 (also in Fig 5).



Figure 6. Metal Migration Wear on Output Wire

In order to test RELIC on a metal migration simulation, I added a 2000 $\mu\text{m}$ long wire with a width of $5\,\mu\text{m}$ to the output node N3 of the superbuffer. As this wire was charged and discharged, the currents through the metal were found to cause metal migration wear. This wear is shown on the plot in Fig 6. Note that the metal migration wear increases and decreases as the current in the wire changes direction, but that the net wear appears to be in the negative direction. This does not mean that there is negative wear on the wire, but that the wear caused by the wire discharging is greater than the wear endured during charging. This is because the wire discharges faster than it charges, and consequently the currents during this period are greater in magnitude and cause more stress according to our simple model.
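A short worked estimate shows why a faster discharge produces more wear even though the same charge moves in both directions. As a simplifying assumption (not taken from the RELIC model itself), suppose the load charge $Q$ flows through a wire of cross-sectional area $a$ at a constant current for a duration $\tau$, and that the temperature factor is the same in both phases. Then the $J^2$ part of the stress integrates to

$$J = \frac{Q}{a\,\tau}, \qquad \int_0^{\tau} J^{2}\, dt = \frac{Q^{2}}{a^{2}\,\tau}, \qquad \frac{w_{\text{discharge}}}{w_{\text{charge}}} = \frac{\tau_{\text{charge}}}{\tau_{\text{discharge}}}.$$

Under these assumptions a discharge that takes half as long as the charge contributes twice the wear, which agrees with the net imbalance described above.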



Figure 7. IBM Bootstrapped Superbuffer, Modified



Figure 8. Wear Plots of Modified Superbuffer

An alternative IBM Bootstrapped Superbuffer is shown in Fig. 7. This circuit has an added transistor M2, which protects the drain of M1 from voltages higher than $V_{dd} - V_{TH}$ (see [Fig 8]). The 12.5 V voltage drop is now spread across two transistors instead of one. Although the drain of M2 is still at 12.5 V, the gate of that transistor is at 5 V, which reduces the voltage across the gate oxide to 7.5 V; this does not register any TDDB stress for the oxide thickness of 125 Å. Similarly, the approximately 8 V voltage drop from the drain to the source of M2 does not register any hot electron effects (see Fig 8). In order to reduce the metal migration stress, I increased the wire width to $10\,\mu\text{m}$. This causes the current density in the wire to decrease, and consequently no metal migration wear is measured (also in Fig 8).

# 6. Conclusions

In this paper we have presented RELIC, a reliability simulator for determining stress and wear on integrated circuits. RELIC employs a unifying strategy for abstracting stress from individual failure mechanisms, and therefore allows for the analysis of many different failure mechanisms. The heart of the RELIC simulator is a circuit simulator, employed both to determine instantaneous voltage and current waveforms and to carry out the actual equations for calculating stress due to failure mechanisms. The analyses carried out by RELIC on two versions of the IBM Bootstrapped Superbuffer show how this tool may be used by circuit designers to both identify problems and verify their correction.

# References

[Chen 85] I. C. Chen, S. E. Holland, and C. Hu, "Electrical Breakdown in Thin Gate and Tunneling Oxides," *IEEE Trans. Elec. Dev.* 32: 413-422, 1985.

[Hohol 86] T. S. Hohol, "RELIC: A Reliability Simulator for Integrated Circuits," S.M. Thesis, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1986.

[Hsu 82] F. C. Hsu, P. K. Ko, S. Tam, C. Hu, and R. S. Muller, "An Analytical Breakdown Model for Short-Channel MOSFET's," *IEEE Trans. Elec. Dev.* 29: 1735-1740, 1982.

[Iversen 86] W. R. Iversen, "Model May Help Solve Chip-Reliability Problem," *Electronics* pp. 20-21, March 24, 1986.

[Kokkonen 84] K. Kokkonen, "The Interaction of Physics and CAD in VLSI CMOS Design," talk given at MIT, Nov. 1984.

[Sakurai 85] T. Sakurai, M. Kakumu, and T. Iizuka, "Hot-Carrier Suppressed VLSI with Submicron Geometry," *IEEE/ISSCC* 272-273, 1985.

[Sing 80] Y. W. Sing and B. Sudlow, "Modeling and VLSI Design Constraints of Substrate Current," *IEEE/IEDM* 732-734, 1980.

[White 85] J. K. White and A. Sangiovanni-Vincentelli, "RELAX2.1 - A Waveform Relaxation Based Circuit Simulation Program," *Proc. 1984 Int. Custom Integrated Circuits Conference*, Rochester, N.Y., June 1984.



# Computer-Aided Circuit Reliability Work at M.I.T.

Lance A. Glasser

January 6, 1987

The first formal M.I.T. course on transistors was taught in 1953. Despite this early start, for most of the 70's the Institute maintained only a small integrated circuit research effort. In 1977 M.I.T. made a strategic decision to reemphasize microsystem technology, and since that time the size of this program has increased dramatically, with new faculty being hired each year. The program is now large and vibrant, spanning research areas from nanometer structures to multiprocessor architectures. One of the many areas which has received recent attention is electrical issues in large digital machines, where a "large" machine is one whose physical dimensions are big compared to the distance a bit spans as it speeds across the machine. Within this area, five critical topics are being addressed: I/O, synchronization (e.g., clocking), power, noise, and reliability. It is this last topic which is addressed in the accompanying article.

RELIC is our first attempt to design a simulation program which enables the engineer to design high-performance circuits not only for worst-case speed, power, and noise margin, but also worst-case reliability. Our program is the first to support the reliability simulation of a circuit stressed by several dynamic reliability hazards.

As process and device technologies advance, the constraints that must be dealt with continually become more complex and difficult. Nevertheless, this complex constraint space is today reflected in the circuit domain as an orthogonal set of relatively simple rules. We do not believe that this simplicity can be maintained. In the future, product competitiveness will be determined, in part, by the ability of the circuit designer to design systems which simultaneously extract the maximum performance from critical devices while avoiding the edge of complex-shaped low-reliability regimes. This will be possible only with the use of high-quality reliability models and computer-aided design tools. RELIC is a demonstration of the sort of low-level design tools which will be necessary. (It is also worthwhile to ask about higher-level tools which aid reliability design.)

One of the limitations of RELIC is that reliability models are not sufficiently developed to predict the failure rate of a chip, even given exact process and circuit models. Nevertheless, it should be possible to compare two circuits for relative reliability and thereby guide the design of high-reliability parts. There is a second, less obvious, application. One commonly used technique for improving the reliability of a part is to do an accelerated burn-in to remove the weak devices, those which contribute to infant mortality. It is not always clear, however, how to accelerate the stress on a part because, though one may raise the power-supply voltage from 5 to 7 V, the voltages internal to the chip need not follow. (Consider voltages controlled by current mirrors or charge pumps.) One therefore needs to design for stressability. For high reliability applications one must be able to design a circuit so that accelerated burn-in can be accomplished. RELIC is suited to this task.

RELIC is a first-generation program. It is an experimental program written by Miss Terry Hohol on top of another experimental program (RELAX 2.2). It is therefore not surprising that while the program is operational, it is neither robust nor user friendly. But we have learned many things from RELIC. A second-generation program, now under development, will improve upon RELIC in five ways: (1) it will be based on a more solid circuit simulation program; (2) the models will be improved based on our better understanding of the literature; (3) the control structure will be modified so that it is easy to see the long-term effects of transconductance, threshold, and leakage degradation on circuit performance; (4) a postprocessor will be added which predicts failure rates and cumulative percent fails in terms of the "wear" simulation variables, which means that one must model the MTTF and $\sigma$ of as many as three statistical populations (main, freak, infant); and (5) we intend to make the second-generation program more robust and hence usable by the community.
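One way to picture the planned postprocessor output is as a weighted mixture of log-normal populations, each with its own median life and $\sigma$. The sketch below is a guess at that structure rather than a description of the second-generation program; the population weights, medians, and sigmas are invented, and the time unit is arbitrary.

```python
import math


def lognormal_cdf(t, median, sigma):
    """Fraction of a log-normal population that has failed by time t."""
    if t <= 0.0:
        return 0.0
    z = math.log(t / median) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def cumulative_fraction_failed(t, populations):
    """Weighted sum over (weight, median, sigma) populations; weights should sum to 1."""
    return sum(w * lognormal_cdf(t, m, s) for w, m, s in populations)


# Hypothetical infant, freak, and main populations (weight, median life, sigma).
populations = [
    (0.01, 1.0e3, 1.0),
    (0.04, 1.0e5, 1.5),
    (0.95, 1.0e7, 0.5),
]
early_fails = cumulative_fraction_failed(1.0e3, populations)
```

At the infant population's median life, half of that small population has failed, so the early cumulative fails are dominated by it even though it makes up only a small share of the devices.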

Assuming that we can accomplish these tasks, a simulation program is still only as good as the models. Improved models are desperately needed. Even after all these years of research, metal migration is still not well understood. For instance, one can find in the literature papers that predict that pulsed operation is better, and worse, than steady-state operation. Hot carrier models are in reasonable shape for dc excitation but, again, the effects of trapping and de-trapping time constants on dynamic stress are unclear. Time-dependent dielectric breakdown is not well modeled even under dc excitation (from electric field data there appear to be at least two competing mechanisms), and pulsed dynamics and interactions with hot-carrier stressing are generally mysterious. Quality programs to do dynamic reliability simulation will soon exist. It is hoped that the reliability physics community will be able to meet the enormous challenge of quantifying the dynamics of device failure under stress so that these programs can accurately predict system reliability.
